Abstract

Among the numerous problems which arise in the context of radar signal processing
is the problem of extraction of information from a noise corrupted signal.
In this application the signal is assumed to be the superposition of outputs from
multiple radar emitters. Associated with the output of each emitter is a unique set
of parameters which are in general unknown. Significant parameters associated with
each emitter are (i) the pulse repetition frequencies, (ii) the pulse durations (widths)
associated with pulse trains and (iii) the pulse amplitudes. A superposition of the
outputs of multiple emitters together with additive noise is observed at the receiver.
In this study we consider the problem of decomposing such a noise corrupted linear
combination of emitter outputs into an underlying set of basis signals while also
identifying the parameters associated with each of the emitters involved. Foremost
among our objectives is to design a system capable of performing this
decomposition/classification in a demanding real-time environment.

We  present  here  a  system  composed  of  three  cascaded  neural-analog  networks 
which,  in  simulation,  has  demonstrated  an  ability  to  nominally  perform  the  task  of 
decomposition  and  classification  of  superposed  radar  signals  under  extremely  high 
noise  conditions. 


1  Introduction 


Addressed  here  is  the  problem  of  decomposing  a  linear  combination  (or  superposition) 
of  basis  signals  into  its  underlying  components  under  the  constraint  that  the  superposition 
has been noise corrupted. We emphasize that the basis functions need not be
known a priori. A problem of this nature has direct applications in practical radar
systems. 

For instance, consider the scenario in which the radar environment is crowded with
signals from a multitude of radar emitters. Each emitter propagates its unique parametric
representation of its characteristic signal through the environment independently of the
others. It is desirable to possess the capability of determining both the number and
identity of the emitters present. This type of identification or classification is probably
most relevant in military applications where it is imperative to differentiate between
friendly and hostile radars.

For  a  system  designed  for  such  an  application  to  be  practical,  it  is  clearly  necessary 
that  intensive  processing  be  performed.  Hence,  foremost  among  our  objectives  is  the 
design  of  a  system  capable  of  performing  this  decomposition  in  a  demanding  real-time 
environment.  In  this  respect,  conventional  digital  hardware  implementations  are  not 
likely  to  succeed.  For  this  reason,  we  focus  attention  on  an  analog  neural  network  solution 
instead.  By  exploiting  the  parallel  nature  of  the  neural  network  architecture,  it  is  possible 
to  far  exceed  the  speed  of  an  equivalent  digital  implementation.  Moreover,  the  neural 
topology  eliminates  the  need  for  an  explicit  algorithmic  development. 

Among the specifications for such a system is good estimation of the weights of the
underlying basis functions when the input signal has been severely degraded by noise.
It will be shown that the proposed system employs a nonlinear neural noise
reduction network which performs demonstrably better than a simple linear low pass
filter. Nonlinear noise reduction aids the classification process and thereby helps
satisfy the aforementioned specification.

2  Problem  Statement 


Figure  1  indicates  the  model  for  the  Radar  Decomposition  System.  For  analytical 
convenience  this  model  is  transformed  by  reflecting  the  sampling  process  all  the  way 
back  to  the  introduction  of  the  basis  functions  (see  Figure  2).  Such  a  transformation  has 
the  advantage  of  allowing  the  problem  formulation  to  be  carried  out  solely  in  discrete 
time.  Also,  a  realistic  signal  processing  application  dictates  that  only  a  window  (a  finite 
number  of  samples)  of  a  signal  may  be  processed  at  any  point  in  time.  The  number  of 
points in this window is assumed to be fixed and is denoted as $n$. Thus, the domain of
the signal can be defined as $\mathbb{N}_s = \{0, 1, \ldots, n-1\}$.

Suppose there exists a finite set of linearly independent discrete time basis functions,

$$B = \{\, \phi_{\{0,\theta_0\}}, \ldots, \phi_{\{N-1,\theta_{N-1}\}} \,\}. \tag{2.1}$$

Here $\phi_{\{k,\theta_k\}} : \mathbb{N}_s \to \mathbb{R}$ is a member of the basis set and
$\theta_k \in \mathbb{R}^M$, $k = 0, 1, \ldots, N-1$, is an $M$ dimensional parameter vector
indexing $\phi$. Unless otherwise indicated, $N$ will for the remainder of this paper refer
to the number of basis functions in the basis set $B$.

A signal¹, $s$, is generated as a linear combination of the basis set, with
corresponding weight vector $a = (a_0, a_1, \ldots, a_{N-1})^T \in \mathbb{R}^N$. That is,

$$s : \mathbb{N}_s \to \mathbb{R}, \qquad s = \sum_{k=0}^{N-1} a_k \phi_{\{k,\theta_k\}} = a^T \Phi(\theta) \tag{2.2}$$

or

$$s(t) = \sum_{k=0}^{N-1} a_k \phi_{\{k,\theta_k\}}(t), \qquad t \in \mathbb{N}_s, \tag{2.3}$$

where $\Phi(\theta)$ is the $N$-dimensional vector formed by the basis functions, given as

$$\Phi(\theta) = (\phi_{\{0,\theta_0\}}, \phi_{\{1,\theta_1\}}, \ldots, \phi_{\{N-1,\theta_{N-1}\}})^T. \tag{2.4}$$

¹Note that the signal $s$ is actually the sampled signal obtained from the superposition of the
continuous basis functions; however, we have assumed that the basis signals themselves have
already been sampled.

Note that the linear independence of the basis set, $B$, ensures that the signal, $s$, has
an unambiguous (unique) representation with respect to the basis set. Formally, a finite
set $B$ consisting of elements $\phi_k$, $k = 0, 1, \ldots, N-1$, has linearly independent
elements if the following condition is satisfied:

$$\sum_{k=0}^{N-1} a_k \phi_k = 0 \iff a_k = 0, \quad k = 0, 1, \ldots, N-1 \tag{2.5}$$

or

$$a^T \Phi(\theta) = 0 \iff a = 0. \tag{2.6}$$

A second signal, $\tilde{s}$, is produced by simply adding Gaussian noise of mean $\mu$ and
variance $\sigma^2$. Thus $\tilde{s}$ is the signal that is presented to the decomposition
network, where

$$\tilde{s}(t) = s(t) + n(t), \qquad n(t) \sim \mathcal{N}(\mu, \sigma^2), \quad t \in \mathbb{N}_s. \tag{2.7}$$

The process by which the input signal $\tilde{s}$ comes about is represented pictorially in
Figure 1. We are now in a position to state clearly the problem which we would like to
solve.

Problem (P): Given the Gaussian noise corrupted signal $\tilde{s}$, along with a linearly
independent set, $B$, of basis functions for $s$, recover the vector, $a$, of weights.
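
The signal model behind problem (P) can be sketched in a few lines of Python. This is an illustration only: the rectangular pulse-train basis, the helper names (`pulse_train`, `superpose`, `corrupt`), the window length and the noise level are assumptions chosen for the example, not values taken from the paper.

```python
import random

def pulse_train(n, period, width):
    """Hypothetical basis function: a unit-amplitude rectangular pulse train
    sampled on t = 0..n-1, indexed by the parameter vector (period, width)."""
    return [1.0 if (t % period) < width else 0.0 for t in range(n)]

def superpose(weights, basis):
    """s(t) = sum_k a_k * phi_k(t), the linear combination of Equation 2.3."""
    n = len(basis[0])
    return [sum(a * phi[t] for a, phi in zip(weights, basis)) for t in range(n)]

def corrupt(s, mu, sigma, rng):
    """s_tilde(t) = s(t) + n(t) with n(t) ~ N(mu, sigma^2), as in Equation 2.7."""
    return [v + rng.gauss(mu, sigma) for v in s]

rng = random.Random(0)      # fixed seed so the run is repeatable
n = 64                      # window length (illustrative)
basis = [pulse_train(n, 16, 4), pulse_train(n, 10, 3)]   # N = 2 emitters
a = [0.6, 0.4]              # the weight vector that problem (P) asks to recover
s = superpose(a, basis)
s_tilde = corrupt(s, 0.0, 0.5, rng)
```

Here `s_tilde` plays the role of the input presented to the decomposition system, and `a` is the quantity to be estimated from it.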

3  Overview  of  the  System 

To  solve  the  problem  (P)  above,  we  propose  a  cascaded  three  block  architecture.  As 
depicted  in  Figure  2,  the  data  flows  from  left  to  right  and  is  processed  serially  by  each 
of the three main components of the system. The first block consists of a maximum entropy
deconvolution network (MEDN), which performs the task of noise reduction. Spectrum analysis of the output of the first block is
performed  in  the  next  block  by  a  second  analog  neural  network.  Finally,  decomposition 
of  the  aggregate  signal  is  performed  in  the  third  block  by  yet  another  analog  neural 
network.  This  final  classification  is  based  on  the  output  spectrum  of  the  previous  block. 

Problem (P) comes about mainly from the physical realities of the radar application
motivating this paper. As such, it is instructive to discuss and justify qualitatively
the nature of the proposed system. Presented to the system is the noise corrupted
superposition of radar pulses as displayed in Figure 2. The objective of the system is
to decompose this noise corrupted superposition into its underlying components or basis
functions. Each basis function may be associated with a particular emitter. Since the
presented signal is corrupted by noise, it is appropriate to attempt to reduce that noise
level in the hope that more reliable data will facilitate the ultimate goal of
decomposition. Hence, the MEDN noise reducer preprocessing is justified. Transformation of
the signal into the frequency domain is warranted by the combination of facts that (i)
the system should be insensitive to the relative phases of the basis components and (ii)
most of the important information is contained in lower order frequencies. Therefore,
only the low order components of a frequency transform such as the cosine transform
are required for the subsequent classification. These lower order terms are then fed into
the final stage of the system, the classifier. The classifier is configured as a feed forward
neural network whose weights are set by the well known back-propagation algorithm.
The implementation of the classifier as a feed-forward neural network is supported by
the empirical observation that such networks perform exceptionally well when tasked
with pattern association. See for example [1].

Figure 1: Schematic of the Radar Decomposition System

3.1  Noise  Reduction 

The MEDN noise reducer is described in detail in the next section. A summary of its
operation is as follows: A noise corrupted signal, $\tilde{s}(\cdot)$, is input to the noise
reducer. This corrupted signal is then convolved with a known window function, $g(\cdot)$.
Equation 4.1 (see next section) is then implemented to recover the noise reduced signal,
$s_{nr}(\cdot) = x_\lambda$.

3.2  Frequency  Transform 

Signals processed in a real system necessarily belong to the class of finite energy
signals. If $x(\cdot)$ is a discrete time real signal this property can be described
mathematically as $x \in \ell_2$, where

$$\ell_2 \triangleq \Big\{ x : \|x\|^2 = \sum_{k=-\infty}^{\infty} x^2(k) < \infty \Big\}. \tag{3.1}$$

In fact, the signals in $\ell_2$ represent a superset of those that would be found in a real
system since signals in a real system must be of finite length. Hence, the signals we
would like to analyze in a physical discrete system are finite length sequences. We will
refer to this set of signals as $(\ell_2)_f$. Furthermore we would like to restrict ourselves
to causal signals, i.e., those signals which are identically zero for all negative time. We
denote this set, the projection of the space $(\ell_2)_f$ onto the space of all causal signals,
as $C((\ell_2)_f)$. These sets are defined precisely below as

$$(\ell_2)_f = \{\, x \in \ell_2 : \exists N \in \mathbb{N} \ni x(k) = 0 \ \ \forall k \ni |k| > N \,\} \tag{3.2}$$

$$C((\ell_2)_f) = \{\, x \in \ell_2 : \exists N \in \mathbb{N} \ni x(k) = 0 \ \ \forall k \in \mathbb{Z} \setminus \{0, 1, \ldots, N-1\} \,\} \tag{3.3}$$

and

$$C((\ell_2)_f) \subset (\ell_2)_f \subset \ell_2. \tag{3.4}$$

3.2.1  The  Cosine  Transform

An appropriate Fourier transform for signals in $\ell_2$, and hence $(\ell_2)_f$, is the Discrete
Time Fourier Transform (DTFT). Assuming that the discrete signal, $x \in \ell_2$, is generated
by sampling an analog signal where the sampling period is given as $T \in \mathbb{R}$, the
continuous parameter of the DTFT is defined as $\theta = \omega T$. Here, $\omega$ represents
frequency.

If, however, only signals in $C((\ell_2)_f)$ are of interest, the simple Cosine Transform may
be employed. The continuous parameter of the Cosine Transform, $\theta$, is defined as above
in the case of the DTFT. A Cosine Transform has the desirable property that it is a
mapping whose range is purely real. We will see that this property greatly facilitates
classification. The Cosine Transform is defined below.


     Property      Description

 1.  Linearity     $\mathcal{C}\{a\,x(\cdot) + b\,y(\cdot)\} = a\,\mathcal{C}\{x(\cdot)\} + b\,\mathcal{C}\{y(\cdot)\}$
 2.  Real range    $\mathcal{R}\{\mathcal{C}\{x(\cdot)\}\} = \mathbb{R}$

Table 1: Cosine Transform Properties


Suppose $x \in C((\ell_2)_f)$; then the Cosine Transform of $x(\cdot)$, denoted
$\mathcal{C}\{x(\cdot)\}$, is given as

$$\mathcal{C}\{x(\cdot)\} \triangleq \sum_{k=0}^{N-1} x(k) \cos(k\theta), \tag{3.5}$$

where $N$ is the finite length of the signal $x(\cdot)$. Some important properties of the Cosine
Transform are presented in Table 1, in which $x, y \in C((\ell_2)_f)$ and $a, b \in \mathbb{R}$.
Property two in Table 1 suggests that the representation of signals in $C((\ell_2)_f)$ is greatly
compacted relative to those in $\ell_2$. Since the DTFT has complex range, complete knowledge
of both the phase and the magnitude of the transform is required; however, the Cosine
Transform requires only knowledge of the half plane in which the phase resides.
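
As a quick numerical check of Equation 3.5 and the two properties in Table 1, the following Python sketch evaluates the Cosine Transform directly from its definition. The function name `cosine_transform` and the test vectors are illustrative choices made for this example.

```python
import math

def cosine_transform(x, theta):
    """C{x}(theta) = sum_{k=0}^{N-1} x(k) cos(k * theta), as in Equation 3.5."""
    return sum(xk * math.cos(k * theta) for k, xk in enumerate(x))

# Two arbitrary finite-length causal signals and two real scalars.
x = [1.0, 2.0, 0.5, -1.0]
y = [0.0, 1.0, 1.0, 3.0]
a, b = 2.0, -0.5
theta = 0.7

# Property 1 (linearity): both sides should agree to rounding error.
lhs = cosine_transform([a * xk + b * yk for xk, yk in zip(x, y)], theta)
rhs = a * cosine_transform(x, theta) + b * cosine_transform(y, theta)
```

Property 2 is visible in the code itself: every term is a real number times a real cosine, so the result is a plain `float` and no complex arithmetic is needed.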


3.2.2  Computation  of  the  Cosine  Transform 

The  objective  of  real-time  performance  requires  that  each  of  the  three  components  of 
the  proposed  system  operate  at  comparable  and  fast  speeds.  At  the  rates  of  convergence 
(around  a  microsecond)  for  the  MEDN  and  classifier  networks,  conventional  frequency 
transformation  methods  are  inadequate.  For  this  reason,  we  again  look  to  an  analog 
approach  [2]. 

Especially tractable for such an approach is the Discrete Hartley Transform (DHT)
[3]. Defined in Equation 3.6, the DHT can be formulated as the matrix multiplication in
Equation 3.7. Let $x \in \mathbb{R}^N$ represent a discrete signal of length $N$ and define
the cas function as $\mathrm{cas}(t) = \sin(t) + \cos(t)$. Then, the DHT of $x$, denoted
$X_H(\theta)$, is

$$X_H(\theta) = \frac{1}{N} \sum_{k=0}^{N-1} x(k)\, \mathrm{cas}(k\theta), \qquad \theta = \frac{2\pi m}{N}, \quad m = 0, 1, \ldots, N-1 \tag{3.6}$$

or

$$X = D^{-1} x \tag{3.7}$$

where $D = [D_{ij}]$ and $D_{ij} = \mathrm{cas}\!\left(\frac{2\pi i j}{N}\right)$. Among the many
'nice' properties of the matrix $D$ are the facts that first, its inverse is given simply as
$D^{-1} = \frac{1}{N} D$, and second, it has condition number equal to one.

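We would now like to relate the DHT and the Cosine Transform. For notational
convenience let the Cosine Transform be denoted as $X_C(\theta) = \mathcal{C}\{x(\cdot)\}$.
Noting that the cas function obeys the property

$$\mathrm{cas}(\theta) + \mathrm{cas}(-\theta) = 2\cos(\theta), \tag{3.8}$$

it is easy to see that the even part of the Hartley transform, $\mathcal{E}\{X_H(\theta)\}$,
gives the Cosine Transform (up to the normalizing factor $1/N$ of Equation 3.6). That is

$$\mathcal{E}\{X_H(\theta)\} = \tfrac{1}{2}\{X_H(\theta) + X_H(-\theta)\} = X_C(\theta). \tag{3.9}$$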
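
The relations above can be checked numerically. The sketch below assumes the DHT is normalized by $1/N$ and sampled at $\theta = 2\pi m / N$; under that assumption the even part of the DHT equals the Cosine Transform divided by $N$. The helper names (`dht_matrix`, `dht`) and the test signal are illustrative.

```python
import math

def cas(t):
    return math.sin(t) + math.cos(t)

def dht_matrix(N):
    """D[i][j] = cas(2*pi*i*j / N)."""
    return [[cas(2.0 * math.pi * i * j / N) for j in range(N)] for i in range(N)]

def dht(x):
    """X = D^{-1} x = (1/N) D x, the assumed 1/N-normalized DHT."""
    N = len(x)
    D = dht_matrix(N)
    return [sum(D[m][k] * x[k] for k in range(N)) / N for m in range(N)]

def cosine_transform(x, theta):
    return sum(xk * math.cos(k * theta) for k, xk in enumerate(x))

x = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 1.5, 0.25]
N = len(x)
X = dht(x)

# D times D should equal N times the identity, which gives D^{-1} = D / N.
D = dht_matrix(N)
DD = [[sum(D[i][k] * D[k][j] for k in range(N)) for j in range(N)]
      for i in range(N)]

# Even part of the DHT; X_H(-m) is read at index N - m because cas is 2*pi periodic.
even = [0.5 * (X[m] + X[(-m) % N]) for m in range(N)]
scaled_cos = [cosine_transform(x, 2.0 * math.pi * m / N) / N for m in range(N)]
```

Since $D D = N I$, the matrix $D / \sqrt{N}$ is orthogonal, which is consistent with the condition number of one quoted in the text.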


3.3  Classification 

Classification as used here has the specific meaning of determining the vector, $a$, of
weights that solves the problem (P). However, classification in the sense of identifying
an arbitrary parameter vector, $\theta$, for some input signal can also be achieved to any
desired accuracy by solving the problem (P). This is accomplished simply by partitioning the
parameter space to the desired resolution. A pictorial representation of a possible
partitioning is presented in Figure 3 for the case of a two dimensional parameter space,
$\theta = (\theta_1, \theta_2)$. Such a scheme, however, has the drawback that it dramatically
increases the number of basis functions in the basis set $B$.

Proposed  for  the  classification  is  a  feed  forward  three-layer  pattern  association  neural 
network.  Hence,  the  network  consists  of  an  input  layer,  an  intermediate  hidden  layer  and 
finally  an  output  layer  (see  Figure  4).  Trained  in  a  supervisory  manner,  the  associator 
network  is  presented  with  a  training  set  of  input/output  patterns.  Here  the  input  pattern 
is  a  linear  combination  of  the  frequency  transformed  basis  functions  and  the  output 
pattern is the corresponding vector of weights. Training of the network is facilitated
through the well known back-propagation algorithm [4].

Such neural topologies have been previously employed by many authors for the task
of pattern association, e.g. [1] or [5]. For instance, Gorman and Sejnowski [1] employed
one in the classification of sonar signals. They observed that, for a moderate number
(more than 150) of presentations of the training set, classification performance improved
as the number of hidden units was increased. Specifically, Gorman and Sejnowski obtained
99.8 percent classification accuracy with the ratio of hidden units to input units equal
to §. For this reason the classifier was chosen to have the same number of hidden units
as input units, i.e. a ratio of hidden units to input units equal to 1. The number of
output units is simply the number of basis functions, $N$, in the basis set $B$. The level
of activation of each output unit then represents the classifier's estimate of the
corresponding weight. For ease of discussion the term 'network' will now be used in
reference to the three layer topology just discussed unless otherwise indicated.
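
The shape of the three layer topology can be made concrete with a forward-pass sketch. Several things here are assumptions made for illustration: the logistic activation, the random placeholder weights (which stand in for values that back-propagation training would produce), and the helper names. The layer sizes follow the one-to-one ratio of hidden to input units described above, with 11 inputs and 2 outputs as used in the simulations reported later in the paper.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer(inputs, weights, biases):
    """One fully connected layer followed by a logistic activation (assumed)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init(n_out, n_in, rng):
    """Small random weights; placeholders standing in for trained values."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return W, b

rng = random.Random(1)
n_i, n_h, n_o = 11, 11, 2          # input, hidden and output unit counts
W1, b1 = init(n_h, n_i, rng)
W2, b2 = init(n_o, n_h, rng)

spectrum = [rng.uniform(-1.0, 1.0) for _ in range(n_i)]   # stand-in transform samples
hidden = layer(spectrum, W1, b1)
estimate = layer(hidden, W2, b2)   # one activation per basis weight
```

With untrained weights the two output activations carry no meaning; the sketch only shows how a vector of low order transform samples is mapped to one estimate per basis function.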

Simulations indicate that when the trained network is presented with an input outside
the training set the result is linearly interpolated, i.e. the network finds a linear fit for the
data presented. Because of this linear interpolation property it is necessary that the
transformation  process  be  linear.  Otherwise,  classification  of  signals  not  in  the  training 
set  will  be  unacceptable.  Although  the  DTFT  is  linear,  it  has  the  drawback  that  its 
range  is  complex;  and  hence,  the  representation  of  one  sample  of  the  DTFT  requires  two 
real  numbers.  The  Cosine  Transform,  which  has  purely  real  range,  averts  this  drawback 
nicely. 

4  The  Deconvolution  Network 


Important to the operation of the system is the analog maximum entropy deconvolution
network (MEDN) suggested by Marrian and Peckerar [6]. As its name suggests, this
network's primary function is the task of deconvolving or deblurring a signal assumed
previously convolved through some (perhaps physical) process. Because of this, it is not
at all obvious why such a scheme is relevant to the solution of the problem discussed
in Section 1. We defer motivating the incorporation of the MEDN in the proposed
system until after a brief overview of the operation of the MEDN.

4.1  Operation  of  the  MEDN 

A detailed analysis of the MEDN is given by Pati et al. in [7]. Deconvolution or
deblurring of signals in the presence of noise is in general ill-posed. Regularization is a
technique to solve ill-posed problems in which a priori knowledge of the solution space
is introduced into the solutions via a functional to be minimized (Poggio and Koch [8]).
Letting $x_\lambda$ denote the regularized solution to the deconvolution problem, we have

$$x_\lambda = \arg\min_x \{\, \|y - Ax\|^2 + \lambda P(x) \,\}, \tag{4.1}$$

where it is assumed that the data has undergone a transformation of the form $y = Ax + e$,
where $A$ is a matrix representing the discretized convolution kernel, $e$ is an $n$-vector of
noise components and $x$ and $y$ are both elements of $n$ dimensional space.

In the particular case of the MEDN, the use of exponential amplifiers at the output
nodes results in the introduction of a regularizing principle which is the negative of the
Shannon entropy of the solution, i.e. $P(x) = \sum_i x_i \log(x_i)$. The constraints imposed by
the entropy regularizer are smoothness and non-negativity of the solution. Hence, the
entropy regularization "smooths" the solution, resulting in reduction of the noise, i.e. an
improvement in the signal to noise ratio.
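
A minimal numerical sketch of the functional in Equation 4.1 with the entropy regularizer is given below. It is not a model of the analog circuit: plain gradient descent stands in for the network dynamics, positivity is enforced by writing $x = \exp(u)$ (only loosely mirroring the exponential output amplifiers), and the blur matrix, $\lambda$, step size and iteration count are all illustrative assumptions.

```python
import math

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def objective(A, y, x, lam):
    """||y - A x||^2 + lam * sum_i x_i log(x_i), the functional of Equation 4.1."""
    r = [yi - ai for yi, ai in zip(y, matvec(A, x))]
    return sum(v * v for v in r) + lam * sum(v * math.log(v) for v in x)

def medn_sketch(A, y, lam, step=0.05, iters=2000):
    """Gradient descent on u, where x = exp(u) keeps every x_i positive."""
    n = len(y)
    At = [[A[i][j] for i in range(n)] for j in range(n)]   # transpose of A
    u = [0.0] * n
    for _ in range(iters):
        x = [math.exp(v) for v in u]
        r = [ai - yi for ai, yi in zip(matvec(A, x), y)]   # A x - y
        g = matvec(At, r)                                   # A^T (A x - y)
        # Chain rule: dJ/du_i = x_i * (2 g_i + lam * (log x_i + 1)), with log x_i = u_i.
        u = [ui - step * xi * (2.0 * gi + lam * (ui + 1.0))
             for ui, xi, gi in zip(u, x, g)]
    return [math.exp(v) for v in u]

def blur_matrix(n):
    """Hypothetical tridiagonal blur with taps (0.25, 0.5, 0.25)."""
    return [[0.5 if i == j else 0.25 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

x_true = [0.2, 1.0, 1.5, 1.0, 0.3, 0.2]
A = blur_matrix(len(x_true))
y = matvec(A, x_true)              # noise-free data, for simplicity
lam = 0.05
x_hat = medn_sketch(A, y, lam)
```

Because every iterate is an exponential, the non-negativity constraint mentioned above holds by construction; the weight `lam` trades fidelity to `y` against the entropy term.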

By suitably adjusting the regularization weight $\lambda$, an arbitrary degree of smoothing
can be achieved. However, if $\lambda$ is allowed to be too large the resulting minimum will
hardly approximate the true solution $x$. Therefore, there exists some optimal value for
$\lambda$. How, then, should the parameter $\lambda$ be chosen? Given that we would like the MEDN
solution to minimize the mean square error (MSE), defined as

$$\mathrm{MSE} = \|x_\lambda - x\|^2, \tag{4.2}$$

the use of optimization based design and the application of computer optimization
software such as CONSOLE (Fan et al. [9]) seems appropriate.

4.2  Noise  Reduction:  Why  the  MEDN? 

In  the  specific  case  of  the  radar  return  data  as  described  earlier  there  is  no  convolution 
present  in  the  noise  model  (i.e.  the  convolution  kernel  A  =  I).  In  fact,  in  general  there 
is  no  reason  to  associate  the  operations  of  convolution  and  noise  reduction  in  any  way. 
This  prompts  the  question  of  the  appropriateness  of  a  deconvolution  network  for  the  task 
of  noise  reduction. 

Measurement of the noise reduction is based on a squared error sum methodology.
Knowledge of the uncorrupted signal, $s$, is assumed for the computation. The magnitude
of the degradation of the signal is then measured in deterministic terms. Given the
degraded signal $\tilde{s}$, the degradation, $d(\tilde{s})$, of the signal is

$$d(\tilde{s}) = \sum_{t=0}^{n-1} (\tilde{s}(t) - s(t))^2. \tag{4.3}$$

Note that the relationship between the signal to noise ratio and $d(\cdot)$ is on the average
a monotonically decreasing one. That is to say that on average the larger the signal to
noise ratio, the smaller $d(\cdot)$ will be for a particular realization of $\tilde{s}(\cdot)$
through noise.

Noise reduction with respect to the MEDN is then simply given as the ratio of the
noise degradation functions of the signals present at the output and input of the MEDN
respectively. If $s_{input}$ and $s_{output}$ represent the input and output of the MEDN, the
noise reduction, NR, is given as

$$\mathrm{NR} = \frac{d(s_{output})}{d(s_{input})}. \tag{4.4}$$

Clearly a value of $\mathrm{NR} < 1$ is required of the MEDN.
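
The degradation measure and the noise reduction ratio are straightforward to compute when the clean signal is known. In the Python sketch below, a moving average is used purely as a stand-in smoother (it is not the MEDN), and the constant test signal, noise level and window size are illustrative assumptions.

```python
import random

def degradation(s_noisy, s_clean):
    """d = sum_t (s_noisy(t) - s_clean(t))^2, as in Equation 4.3."""
    return sum((a - b) ** 2 for a, b in zip(s_noisy, s_clean))

def noise_reduction(s_out, s_in, s_clean):
    """NR = d(s_out) / d(s_in), as in Equation 4.4; NR < 1 means less noise."""
    return degradation(s_out, s_clean) / degradation(s_in, s_clean)

def moving_average(x, half_width):
    """Stand-in smoother (NOT the MEDN): mean over a centered window."""
    n = len(x)
    out = []
    for t in range(n):
        window = x[max(0, t - half_width):min(n, t + half_width + 1)]
        out.append(sum(window) / len(window))
    return out

rng = random.Random(0)
clean = [1.0] * 200                                   # constant test signal
noisy = [v + rng.gauss(0.0, 0.5) for v in clean]
smoothed = moving_average(noisy, 2)
nr = noise_reduction(smoothed, noisy, clean)
```

Any candidate noise reducer can be dropped in place of `moving_average` and compared on the same `nr` scale.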


It  is  obvious,  as  alluded  to  above,  that  deconvolution  per  se  is  unwarranted  in  the 
problem  (P).  What  is  of  importance,  however,  is  the  regularizing  principle  employed 
by  the  MEDN.  Through  exploitation  of  the  regularizing  properties  of  the  network,  it  is 
possible  to  achieve  the  desired  end  of  noise  reduction.  This  claim  is  supported  through 
the  observations  that  the  MEDN 

1.  exhibits  non-linear  reduction  in  the  noise  (see  Figure  3) 

2. performs demonstrably better than a simple low pass filter (see Figure 5).


4.3  Implementation  of  the  MEDN  Noise  Reducer 

Since there is no convolution present in the signal, $\tilde{s}(\cdot)$, it is necessary to
convolve the noisy data with a non-trivial convolution kernel. The reason for this is that
if the convolution kernel were trivial, the MEDN would become decoupled. That is, the
solution given by the vector equation 4.1 for a given element would become independent of
the value of its neighboring elements. Or equivalently, the vector equation 4.1 becomes $n$
independent scalar equations and there is no hope of smoothing the data. To avert
this problem, the incoming data is preconvolved with a non-trivial convolution kernel or
window function, denoted by $g(\cdot)$ and given in Equation 4.5.

$$g(t) = \begin{cases} h, & \text{if } |t| < N_k \\ 0, & \text{otherwise} \end{cases} \tag{4.5}$$

Thus, the pulse width of the kernel, $2N_k - 1$, is an added dimension to the space of
parameters over which the minimization of 4.2 will range.

Figure 5: Noise reduction using MEDN as compared to simple low pass. Legend:
reconstruction (solid line), additive noise (dotted line), low pass (dashed line).
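
The decoupling argument can be seen directly by writing the preconvolution as a matrix. The sketch below builds the matrix for the rectangular window of Equation 4.5; truncating the window at the edges of the data block and the helper names are assumptions made for this illustration.

```python
def window(t, h, n_k):
    """g(t) = h if |t| < N_k, else 0, as in Equation 4.5."""
    return h if abs(t) < n_k else 0.0

def convolution_matrix(n, h, n_k):
    """T[i][j] = g(i - j), so y = T x is the discrete convolution of x with g
    (the window is simply truncated at the block edges, an assumption here)."""
    return [[window(i - j, h, n_k) for j in range(n)] for i in range(n)]

def preconvolve(x, h, n_k):
    T = convolution_matrix(len(x), h, n_k)
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in T]

T_trivial = convolution_matrix(6, 1.0, 1)   # N_k = 1: kernel width 2*1 - 1 = 1
T_coupled = convolution_matrix(6, 1.0, 2)   # N_k = 2: kernel width 3
spread = preconvolve([0.0, 0.0, 1.0, 0.0, 0.0, 0.0], 1.0, 2)
```

With $N_k = 1$ the matrix is the identity, so each equation involves a single sample; with $N_k = 2$ each row touches its neighbors, which is the coupling the regularizer needs in order to smooth.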


5  System  Simulation 


Extensive computer simulations of the proposed system have been performed. As a
test case, the value $N = 2$ for the number of basis functions in the basis set, $B$, was chosen
in the simulations. Presented in Table 2 below are other choices for system parameters.

The first block of the proposed architecture, the MEDN noise reducer, has been
simulated using the interactive simulation package SIMNON (Lund Institute of Technology).
Appendix A contains typical simulation results. The dashed lines represent the
respective noise functions in both the Squared Error and Noisy Reconstruction graphs,
and the solid line represents the spectrum of the uncorrupted data in the Transform
graph. Parameter nn in the figures indicates the standard deviation of the noise which
is added to the normalized input signal. This value of nn is obtained by multiplying
the standard deviation by a factor of 100 followed by truncation, i.e., $nn = \mathrm{int}(100\sigma)$.

  Block        Symbol   Parameter                 Value

  Convolver    $h$      Kernel height             1
  MEDN         $R$      Resistance                10 Ω
               $C$      Capacitance               10 μF
               $S$      Feedback gain             10
               $N_k$    Kernel width              5
  Classifier   $n_i$    Number of input units     11
               $n_h$    Number of hidden units    11
               $n_o$    Number of output units    2

Table 2: System Parameters


Inspection  of  the  Squared  Error  graphs  indicates  that  substantial  noise  reduction  has 
been  achieved  as  defined  by  equation  4.4. 

As discussed earlier, the transform stage of the system is implemented as a simple
Cosine Transform. This simply entails evaluating equation 3.5 for $\theta = \frac{2\pi}{n} k$,
$k = 0, 1, \ldots, n_i - 1$ $(= 10)$, where $n$ is the length of the input signal. Typical
transforms can be seen in Appendix A.


Simulation  of  the  back-propagation  classification  network  is  supported  by  the  neural 
network  software  package  written  by  Rumelhart  and  McClelland  [4].  Normalization  of 
the  elements  of  the  training  set  greatly  reduces  its  size  since  only  parameter  vectors 
whose  elements  add  to  one  need  be  included.  This  effectively  reduces  the  output  space 
to a dimension one less than the original space. For the simulations in Appendix A, the
network was trained on a training set of only ten patterns.

A summary of the performance of the simulation of the system is presented in Table
3. The table consists of five sets of runs; each set is associated with a different signal to
noise ratio, S/N. Each run set consists of five sets of parameters from the training set and
five outside the training set. The values $\alpha_1$ and $\alpha_2$ represent the respective
weights of the basis functions composing the deterministic portion of the input signal.
Classification results are displayed in the following two columns as $\hat{\alpha}_1$ and
$\hat{\alpha}_2$, corresponding to the two respective estimates of the basis weights,
$\alpha_1$ and $\alpha_2$. The final four columns of the table present various percent error
statistics: $\gamma_1$ and $\gamma_2$ are the respective percent errors in the estimates of
$\alpha_1$ and $\alpha_2$, $\gamma$ is the average percent error for both $\alpha_1$ and
$\alpha_2$, and finally $\eta$ is the total average error for all runs in the test set.

S/N (dB)  $\alpha_1$  $\alpha_2$  $\hat{\alpha}_1$  $\hat{\alpha}_2$  $\gamma_1$  $\gamma_2$  $\gamma$  $\eta$

0.0       0.2         0.8         0.15              0.85              26.96       6.72        16.84     14.43
          0.4         0.6         0.30              0.70              23.99       15.96       19.97
          0.5         0.5         0.42              0.58              16.23       16.22       16.23
          0.6         0.4         0.54              0.46              9.53        14.31       11.92
          0.8         0.2         0.76              0.24              5.13        20.61       12.87
          0.1         0.5         0.08              0.52              22.62       4.51        13.57
          0.4         0.5         0.32              0.58              20.63       16.48       18.55
          0.5         0.3         0.46              0.34              8.27        13.82       11.05
          0.8         0.4         0.75              0.45              6.60        13.25       9.93
          0.9         0.7         0.79              0.81              11.73       15.09       13.41

1.0       0.2         --          0.16              --                21.20       5.25        13.23     11.04
          0.4         --          0.32              --                19.89       --          16.54
          0.5         0.5         0.44              0.56              12.77       12.77       12.77
          0.6         0.4         0.56              0.44              6.90        --          8.65
          0.8         0.2         0.77              0.23              3.87        15.59       9.73
          0.1         0.5         0.08              0.52              16.27       3.21        9.74
          0.4         0.5         0.33              0.57              16.83       13.42       15.13
          0.5         0.3         0.47              0.33              5.82        9.76        7.79
          0.8         0.4         0.76              0.44              4.44        8.97        6.71
          0.9         0.7         0.82              0.78              8.81        11.36       10.09

2.0       0.2         0.8         0.17              --                16.52       --          10.30     8.38
          0.4         0.6         0.33              --                16.71       --          13.90
          0.5         0.5         0.45              0.55              10.13       --          10.13
          0.6         0.4         0.57              0.43              4.84        7.34        6.09
          0.8         0.2         0.78              0.22              2.92        11.79       7.36
          0.1         0.5         0.09              0.51              11.02       2.15        6.59
          0.4         0.5         0.34              0.56              13.88       11.07       12.48
          0.5         0.3         0.48              0.32              3.92        6.61        5.27
          0.8         0.4         0.78              0.42              2.78        5.67        4.22
          0.9         0.7         0.84              0.76              6.54        8.46        7.50

3.0       0.2         0.8         0.17              0.83              12.78       3.14        7.96      6.29
          0.4         0.6         0.34              0.66              14.15       9.38        11.77
          0.5         0.5         0.46              0.54              8.03        8.05        8.04
          0.6         0.4         0.58              0.42              3.24        4.95        4.09
          0.8         0.2         0.78              0.22              2.19        8.89        5.54
          0.1         0.5         --                0.51              6.86        1.32        4.09
          0.4         0.5         --                --                11.52       9.19        10.36
          0.5         0.3         0.49              0.31              2.43        4.16        3.29
          0.8         0.4         --                0.41              1.49        3.10        2.30
          0.9         0.7         --                0.74              4.79        6.22        5.51

$\infty$  0.2         --          0.21              --                2.57        0.69        --        2.93
          0.4         --          0.38              --                4.29        2.85        --
          0.5         0.5         0.50              --                0.34        0.39        0.37
          0.6         0.4         0.61              0.39              2.43        3.53        2.98
          0.8         0.2         0.79              0.21              0.91        3.49        2.20
          0.1         0.5         0.11              0.49              10.40       2.12        6.26
          0.4         0.5         0.39              0.51              2.56        2.06        2.31
          0.5         0.3         0.51              0.29              2.81        4.53        3.67
          0.8         0.4         0.82              0.38              2.98        5.79        4.39
          0.9         0.7         0.91              0.69              1.60        1.96        1.78

Table 3: Simulation Summary (entries marked -- are illegible in the source)

This simulation data seems to suggest an inverse relationship between the average
total error, $\eta$, and the signal to noise ratio, S/N. That is, the average total error is a
monotonically decreasing function of the signal to noise ratio. Moreover, we have fitted
a curve of the form

$$f(x) = b + e^{(\mu - a x)}, \qquad x \in [0, \infty) \tag{5.1}$$

to the experimental data. Figure 6 displays the experimental and fitted (analytical)
curves obtained. The values obtained for the parameters are $b = 2.93$, $\mu = 2.44$, and
$a = 0.372$.

Figure 6: The average total error of parameter estimation as a function of the signal to
noise ratio. (Horizontal axis: Signal to Noise (dB).)


6  VLSI  Implementation 

In  order  to  realize  the  full  computational  power  of  neural  networks  it  is  important  to 
consider  implementation  of  neural  networks  as  analog  electrical  circuits.  Although  the 
current  state  of  VLSI  technology  does  not  allow  for  implementation  of  extremely  large 
networks,  there  is  still  a  class  of  useful  networks  that  can  be  implemented.  The  networks 
we  consider  here  are  among  those  for  which  currently  implementable  sizes  are  sufficient 
to prove useful. As discussed earlier, foremost among our objectives is the design
and implementation of a system suitable for practical applications. Essential to the
practicality of such a system are:

1. The ability to perform the desired decomposition and classification within the
constraints of a demanding real-time environment.

2.  Physical  compactness  and  modest  power  consumption,  so  as  to  satisfy  constraints 
of  the  physical  environment  (e.g.  in  missile  guidance  applications). 

In  this  section  we  discuss  analog  VLSI  implementation  of  the  proposed  system  for 
radar signal decomposition and classification. As we shall show, analog VLSI implementation
results in a system capable of satisfying the desired performance criteria.

Figure 7: Network to implement discretized analog convolution


6.1  The  Convolution  Subnetwork 


Given a vector $x \in \mathbb{R}^n$ of sampled data, convolution with a given convolution kernel
is described by the system of equations $y = Tx$, where $y, x \in \mathbb{R}^n$ and $T \in \mathbb{R}^{n \times n}$
is the matrix representation of the discretized convolution kernel. Convolution is easily
implemented in analog circuitry by the network shown in Figure 7. Inputs to the network
in Figure 7 are the voltages $x_1, \ldots, x_n$. Connections within the network are made by
resistors with values $R_{ij} = 1/t_{ij}$. Thus the contribution to the current at the $j$th output
$y_j$ due to $x_i$ is given by Ohm's Law to be $x_i / R_{ij}$. Hence the output currents are given by

$$ y_j = \sum_{i=1}^{n} \frac{x_i}{R_{ij}} = \sum_{i=1}^{n} t_{ij} x_i, \qquad j = 1, \ldots, n, \tag{6.1} $$

which implements the desired convolution.
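
As a rough numerical illustration of Equation 6.1, the following sketch sums the Ohm's Law contribution of each input voltage at each output. The function name, the use of `None` for an absent connection, and the 1 kΩ sample values are assumptions made for illustration only and do not come from the circuit described here.

```python
def output_currents(voltages, resistances):
    """Return y_j = sum_i x_i / R_ij for a resistive interconnection network.

    voltages    -- input voltages x_1..x_n
    resistances -- resistances[i][j] is R_ij, the resistor joining input i
                   to output j; None marks an absent connection (t_ij = 0)
    """
    n_out = len(resistances[0])
    currents = [0.0] * n_out
    for i, x_i in enumerate(voltages):
        for j in range(n_out):
            r_ij = resistances[i][j]
            if r_ij is not None:
                currents[j] += x_i / r_ij  # Ohm's Law contribution of x_i
    return currents


# Hypothetical 3-input example: 1 kOhm resistors on a tridiagonal band.
R = [[1e3, 1e3, None],
     [1e3, 1e3, 1e3],
     [None, 1e3, 1e3]]
x = [1.0, 2.0, 3.0]          # volts
y = output_currents(x, R)    # amperes
```

Each output current is simply the sum of the input voltages it is connected to, scaled by the common conductance, which is the discretized convolution.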


For the particular ‘boxcar’ convolution kernel given by Equation 4.5, the matrix
$T = [t_{ij}]$ is a band-diagonal matrix given by

$$ t_{ij} = \begin{cases} 1 & \text{if } |i - j| \le (N_k - 1)/2, \\ 0 & \text{otherwise.} \end{cases} \tag{6.2} $$


Figure 8: The Maximum Entropy Deconvolution Network (MEDN)

Thus the number of nonzero connections required is $n + 2 \sum_{i=1}^{(N_k - 1)/2} (n - i)$. Fabrication
of the required network interconnections is thus facilitated by choice of the boxcar
convolution kernel, since

1. only a small number of connections are required in comparison to the number of
possible connections ($n^2$) for an arbitrary choice of the convolution kernel, and

2. the weights (resistance values) for the connections are all identical, so errors due
to process-induced variations in the circuit are minimized.
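
The connection count for the boxcar kernel can be checked numerically. The sketch below builds a band-diagonal matrix and compares a direct count of its nonzero entries with the closed-form expression for a band of half-width (N_k - 1)/2. Unit weights inside the band, an odd kernel length, and the sizes n = 11 and N_k = 5 are assumptions chosen for illustration.

```python
def boxcar_matrix(n, n_k):
    """Band matrix with t_ij = 1 where |i - j| <= (n_k - 1) / 2, else 0."""
    half = (n_k - 1) // 2
    return [[1 if abs(i - j) <= half else 0 for j in range(n)]
            for i in range(n)]


def connection_count(n, n_k):
    """Closed-form count: n + 2 * sum_{i=1}^{(n_k - 1)/2} (n - i)."""
    half = (n_k - 1) // 2
    return n + 2 * sum(n - i for i in range(1, half + 1))


n, n_k = 11, 5                      # illustrative sizes, n_k odd
T = boxcar_matrix(n, n_k)
nonzero = sum(sum(row) for row in T)
print(nonzero, connection_count(n, n_k), n * n)
```

For these sizes the band holds 49 of the 121 possible connections, and the direct count agrees with the closed-form expression.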

6.2  The  Deconvolution  Subnetwork 

As  discussed  earlier,  regularized  deconvolution  is  performed  by  the  maximum  entropy 
deconvolution  network  (MEDN)  shown  schematically  in  Figure  8.  Inputs  to  the  MEDN 
($y_1, \ldots, y_n$) are currents which are provided by the outputs of the convolution network
just described. An initial attempt at integrated circuit implementation of the deconvolution
network has been completed, but remains untested. In addition, two prototype
breadboard networks have been constructed and have been used to study performance
of the deconvolution network in terms of speed of convergence and accuracy of solutions.

Accurate assessments of convergence time for the network are not easily made using
digital  computer  simulations.  Also,  in  analysis  of  the  deconvolution  network  (see  [7])  it 
was  assumed  that  any  dynamics  associated  with  the  constraint  plane  could  be  ignored 
provided  that  the  signal  plane  amplifiers  are  sufficiently  slower  in  response.  In  practical 
implementations  of  such  a  network,  it  is  necessary  to  understand  what  effects  delays  in 
the  constraint  plane  response  may  have  upon  the  network.  It  is  ultimately  the  constraint 
plane  dynamics  which  limit  the  speed  of  convergence  which  is  achievable.  A  formal 
treatment  of  this  subject  is  to  be  found  in  Marcus  and  Westervelt  [10]. 

6.2.1  Breadboard  Prototype  Deconvolution  Network 

A prototype breadboard model of the deconvolution network was constructed using ‘off-the-shelf’
operational amplifiers, resistors and capacitors. The network was constructed
with seven signal plane nodes and seven constraint plane nodes. Since the purpose of
constructing the breadboard prototype was to estimate the speed of convergence achievable
by such a network, exponential amplifiers in the signal plane were replaced by unity
gain linear amplifiers to simplify the circuit². Replacing the exponential amplifiers by
linear amplifiers results in the entropy regularizer being replaced by a regularizer of the
form,

$$ \frac{1}{2} \sum_i x_i^2. \tag{6.3} $$

In  Figure  9  a  circuit  diagram  of  a  single  signal  plane  node  is  shown.  Each  signal  plane 
node  consists  of  two  stages  of  amplification.  Associated  with  the  first  stage  is  the  feedback 
capacitor  C,  which  introduces  the  relevant  network  dynamics,  and  the  feedback  resistor 
R, which introduces and weights (λ = 1/R) the regularizing term in the energy function.
The  second  stage  of  each  signal  plane  node  is  configured  as  an  analog  inverter.  Negative 
connection  weights  are  implemented  simply  by  using  the  inverted  output  of  the  node. 
Although  negative  connection  weights  are  not  required  in  the  particular  application  we 
consider  here,  they  were  necessary  for  the  tactile  sensing  application  (see  [7])  in  which 
the  network  was  tested.  Figure  10  shows  a  single  constraint  plane  node.  Each  constraint 
plane  node  also  consists  of  two  stages.  The  first  stage  is  configured  as  a  virtual  ground 
transimpedance  amplifier,  which  provides  the  feedback  gain  for  signals  fed  back  to  the 
signal  plane.  As  in  the  signal  plane,  the  second  stage  of  each  constraint  plane  node  is  an 
analog  inverter. 

In Figure 11 photographs of the 7-Channel breadboard prototype of the deconvolution
network and the experimental setup used for testing it are shown. Input currents to
the network are clocked using a 1 kHz relaxation oscillator so as to observe transients
(as the network evolves) using an oscilloscope. Outputs of the network are captured
by  a  MetaResearch  data  acquisition  board  used  in  conjunction  with  a  Macintosh  Plus 
computer  and  the  resultant  reconstruction  is  plotted  on  the  Macintosh  display. 

The rise time of the constraint plane amplifiers was measured to be approximately 1
μsec. Actual response time of the constraint plane would be longer than this since the
parallel combination of all resistors connected to the input of any node contributes to the
RC time constant. It was observed that for choices of the signal plane capacitors C for
which the rise time of the outputs of the network would be below 10 μsec, the outputs
would oscillate, i.e., the network was unstable. For C = 10 pF the rise time of the outputs
of the network was measured to be 10 μsec (see Figure 12). It is clear that the use of
faster operational amplifiers would result in an increase in achievable speed since this
would decrease the constraint plane response time and thereby permit a decrease in the
time constant of the signal plane.

² A second breadboard prototype was also constructed which contained the exponential amplifiers, but
was used to solve a different problem (see [11]).

Figure 9: Schematic circuit diagram of a single signal plane node

Figure 10: Schematic circuit diagram of a single constraint plane node

Figure 11: Top: Photograph of 7-Channel breadboard prototype of deconvolution network
Bottom: Photograph of experimental setup used to test the deconvolution network.

Settling  time  and  overshoot  of  the  outputs  of  the  network  are  controlled  by  the  gain 
of the constraint plane nodes. CONSOLE, a CAD tool for parametric optimization of
dynamical systems (see [9]), was used to choose a value for the gain so as to minimize
overshoot  and  settling  time. 

6.2.2  Integrated  Circuit  Prototype  of  Deconvolution  Network 

A prototype analog integrated circuit implementation of the deconvolution network described
here has been fabricated, but remains to be tested. A hierarchical design philosophy
is practiced in this initial implementation. The deconvolution network may be
thought of as composed of two sections: (i) Active components of the network including
signal and constraint plane amplifiers and (ii) The functionally passive³ resistive interconnection
matrix. These two sections may also be thought of in the following manner.
Once  the  size  of  the  deconvolution  network  (number  of  inputs  and  outputs)  has  been 
decided,  the  amplifiers  of  the  network  are  determined.  However,  the  resistive  matrix 
may  be  a  variable  entity.  Thus  the  network  may  be  thought  of  as  being  composed  of  a 
fixed  part  and  a  variable  part. 

If  fixed  resistors  are  to  be  used  to  implement  the  interconnect  matrix,  then  some 
provision  should  be  made  to  change  this  matrix  without  having  to  refabricate  the  rest  of 
the  network.  In  order  to  provide  some  flexibility  in  the  choice  of  the  interconnect  matrix 
and  to  permit  the  use  of  two  different  fabrication  technologies,  the  deconvolution  network 
was  fabricated  as  two  separate  integrated  circuits. 

The  amplifier  chip  is  designed  to  serve  as  the  ‘motherboard’  for  the  network  on  top 
of  which  the  resistive  connection  matrix  chip  is  placed  (see  Figure  13).  Connections 
between  the  two  chips  are  made  by  local  wire  bonds  between  bonding  pads  provided  for 
this  purpose  on  both  chips.  This  approach  also  facilitates  experimentation  with  different 
types  of  connection  matrices  such  as  those  with  programmable  connections. 

The Amplifier Chip. Figure 14 shows the layout of the amplifier section of the deconvolution
network. This chip provides the signal plane and constraint plane amplifiers for
a deconvolution network with eleven input nodes and eleven output nodes. Fabrication
of the amplifier section of the deconvolution network was undertaken through the MOSIS
facility, using a two metal layer CMOS p-well technology with a minimum feature size of
3 μm. The size of the amplifier chip is approximately 6300 μm × 8800 μm. In the center
of the amplifier chip, a 4000 μm × 3200 μm space has been provided for placement of the
second chip containing the resistive connection matrix. It is actually not necessary to
leave this space since the chip is passivated, but for an initial implementation, it was
provided.

³ The term ‘functionally passive’ here is used to describe the fact that in some situations it is desirable
to use active circuit components configured to look like a passive resistor from an input/output standpoint.

Figure 12: Photographs of oscilloscope traces showing, Top: time evolution of a single
output of the signal plane, and Bottom: time evolution of a single output of the constraint
plane, for the 7-channel breadboard prototype deconvolution network.

Figure 13: Resistive matrix chip is placed directly atop the amplifier chip and connected
to it using local bonding

A total of forty-four operational amplifiers are implemented on this chip: twenty-two
for the signal plane and twenty-two for the constraint plane. Each signal plane node and
constraint  plane  node  is  identical  in  structure  to  the  signal  and  constraint  plane  nodes 
used  for  the  breadboard  prototype.  Both  the  inverted  and  noninverted  outputs  of  the 
signal  and  constraint  plane  nodes  are  connected  to  bonding  pads  adjacent  to  the  area 
where  the  resistive  matrix  is  to  be  placed.  Thus  provision  is  made  for  the  implementation 
of  positive  and  negative  connection  weights.  Outputs  and  inputs  of  both  the  signal  plane 
nodes  and  constraint  plane  nodes  are  connected  to  bonding  pads  located  on  the  periphery 
of  the  die  to  permit  access  to  these  nodes  after  the  chip  has  been  packaged.  Packaging 
of the chip requires a package with a minimum of fifty pins, if the inputs and outputs of all
nodes are to be accessible.

The  basic  building  block  of  the  amplifier  chip  is  the  operational  amplifier.  Figure  16 
shows  a  circuit  schematic  of  the  operational  amplifier  designed  for  this  purpose.  Design 
of  the  amplifier  takes  into  consideration  the  current  driving  requirements  for  use  in  the 
deconvolution  network.  Inputs  on  each  amplifier  are  diode  protected  against  spikes. 
A  source  follower  output  stage  is  used  on  every  amplifier  so  as  to  meet  the  current 
driving requirements while maintaining relatively low circuit complexity (compared to,
for example, a push-pull output stage).

Figure 14: Layout of integrated circuit chip containing all amplifiers for an eleven channel
deconvolution network

Figure 15: Left: Layout of signal plane amplifier; Right: Layout of constraint plane
amplifier.

As can be seen from Figure 16, an internal compensation capacitor of 5 pF is required
for every operational amplifier. In addition to this, external feedback capacitors of about
10 pF are required at the first stage of every signal plane node. Since the technology
available did not include an additional electrode layer (i.e. a second layer of polysilicon
to be used in forming capacitors), all capacitors in the network are formed as parallel
plate capacitors with polysilicon as one electrode and the first metal layer as the second
electrode. Capacitance between the first metal layer and polysilicon is approximately a
factor of eight less than the capacitance between polysilicon and an electrode layer. Hence
about eight times as much area is used to form the capacitors, compared to capacitors
formed using an electrode layer. Since about half of the area used by constraint plane
nodes and two thirds of the area used by the signal plane nodes is used to form the
capacitors, about seven twelfths of the total area used by the amplifiers is taken up by
capacitors. Hence the use of a separate electrode layer would result in about a 50%
reduction in the area used. Therefore a network with twenty-two inputs and twenty-two
outputs is a trivial extension of the current structure (which includes the space
provided for the resistive matrix) using a comparable 3 μm technology.
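
The area figures quoted above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that the signal plane and constraint plane nodes occupy equal total area, and that a dedicated electrode layer would shrink every capacitor to one eighth of its present area. The variable names are hypothetical.

```python
from fractions import Fraction

# Fractions of node area occupied by capacitors, as stated in the text.
cap_frac_constraint = Fraction(1, 2)
cap_frac_signal = Fraction(2, 3)

# Assumption: signal plane and constraint plane nodes occupy equal total area.
cap_frac_total = (cap_frac_constraint + cap_frac_signal) / 2

# Assumption: an electrode layer gives about 8x the capacitance per unit
# area, so each capacitor would shrink to about 1/8 of its present area.
shrink = Fraction(1, 8)
area_saved = cap_frac_total * (1 - shrink)

print(cap_frac_total)        # 7/12
print(float(area_saved))     # about 0.51
```

Under these assumptions the capacitors account for seven twelfths of the amplifier area, and removing seven eighths of that area saves roughly half of the total, in line with the estimate given.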

The Resistive Interconnection Matrix Chip. In any implementation of a matrix
of  resistive  elements,  it  is  necessary  to  consider  the  effects  of  process  induced  variations 
on  the  performance  of  the  network  with  which  it  is  to  be  used.  Current  integrated  circuit 
process  technology  may  introduce  variations  in  the  value  of  any  given  resistor  as  large  as 
20%.  It  must  either  be  established  that  degradation  in  performance  of  the  network  due 
to  such  variations  is  irrelevant,  or  an  approach  to  implementation  must  be  taken  which 
preserves  the  essential  characteristics  of  the  interconnection  matrix  in  the  face  of  process 
induced  variations.  The  latter  approach  was  taken  in  this  implementation. 

Figure 17: A discrete resistive element as used in the resistive matrix chip.

Since in the case of the deconvolution network, the resistive interconnection matrix
is a discretization of a convolution kernel, it is reasonable to try to preserve the shape
of the connectivity profile. That is, it is the ratiometric relationship of the connective
weights that is truly important since any other variations correspond to a simple scaling
of the inputs or outputs. For the boxcar convolution kernel considered here, preservation
of the shape of the connectivity profile corresponds to the requirement that all nonzero
connections should be equal. In an attempt to preserve the shape of the connectivity
profile in the face of process-induced variations, the connective weights are quantized
into quanta of discrete resistive elements. Figure 18 shows the layout of the resistive
matrix. A discrete resistive element is formed by etching a 5 μm × 5 μm opening in the
oxide layer between two metal wires and then evaporating silicon into the opening (see
Figure 17). Amorphous silicon therefore forms the resistive material. Each such discrete
resistor has a resistance value that depends on the thickness of the oxide layer and the
area of the opening. For the oxide thickness used (and 5 μm × 5 μm openings) the value of
a single discrete resistor was measured to be approximately 35 kΩ. A wire grid is formed
using the first metal layer to construct parallel wires in one direction and the second
metal layer to construct wires in a direction orthogonal to the wires formed in the first
metal layer. At each point on the grid, a connective weight is defined by placing a number
of discrete resistors of the type described at that point. The strength of connection at
any point on the grid is determined by the number of discrete resistors placed there.
Ratiometric relationships between elements of the connection matrix should for the most
part be preserved in this approach since the ratio of two connective weights is primarily
determined by the ratio of the number of discrete resistors forming each connection. Any
variations in the ratios are the effects of phenomena such as nonuniform oxide thickness
over the area of the die and nonuniform deposition of amorphous silicon, which are relatively
small effects.
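
The ratiometric argument can be illustrated with a small sketch. It assumes that a process variation scales every discrete resistor on the die by the same factor, so that local nonuniformities are ignored, and it takes 35 kΩ as the nominal unit resistance. The function names and the ±20% range used in the loop are hypothetical choices for illustration.

```python
import random

R_UNIT_NOMINAL = 35e3   # ohms, nominal value of one discrete resistor


def conductance(count, r_unit):
    """Conductance of `count` identical discrete resistors in parallel."""
    return count / r_unit


def weight_ratio(count_a, count_b, r_unit):
    """Ratio of two connection weights built from the same unit resistor."""
    return conductance(count_a, r_unit) / conductance(count_b, r_unit)


random.seed(0)
for _ in range(3):
    # Hypothetical die-wide process variation of up to +/-20 percent.
    r_unit = R_UNIT_NOMINAL * random.uniform(0.8, 1.2)
    print(round(r_unit), weight_ratio(3, 2, r_unit))
```

Whatever value the unit resistance takes, the ratio of a three-resistor connection to a two-resistor connection stays at 3/2, since the unit resistance cancels.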

6.3  Cosine  Transformation  in  VLSI 

As discussed earlier, the Cosine Transform of the sampled signal is necessary in order to
perform the task of classification. It is observed that the Discrete Hartley Transform
(DHT) of a vector $x$ of samples of a signal may be obtained as the result of multiplication
by the inverse of the self-adjoint matrix $D$, i.e.

$$ X = D^{-1} x = \frac{1}{n} D x, \tag{6.4} $$

where $X$ is the Discrete Hartley transform of $x$. It is also observed that the Cosine
Transform may be obtained by a simple average of the components of the DHT (see
equation 3.9).

In  [2]  one  approach  to  implementing  the  DHT  using  an  analog  network  is  discussed. 
However,  the  approach  taken  in  [2]  involves  a  recurrent  network  in  which  dynamics  govern 
the  speed  of  convergence  as  in  the  MEDN.  Also  the  implementation  in  [2]  requires  the 
implementation  of  a  large  number  of  amplifiers.  We  consider  a  simpler  approach  here 
in  which  we  perform  the  vector  matrix  multiplication  required  for  the  DHT  as  in  the 
convolution  network  described  earlier.  As  observed  in  [2],  symmetry  of  the  cas  function 
requires  the  fabrication  of  only  n/4  distinct  values  of  conductance  in  order  to  implement 
the  matrix  D.  An  approach  similar  to  the  one  described  earlier  may  be  used  to  fabricate 
the  resistive  array  in  a  quantized  manner.  Once  the  DHT  has  been  obtained,  the  Cosine 
Transform  may  be  derived  from  this  by  using  an  array  of  summing  amplifiers  (see  equation 
(3.9)).  This  approach  results  in  an  implementation  with  minimal  time  delay  involved  in 
computing  the  transform. 
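
The feedforward computation described above can be sketched numerically. The example below assumes the standard Hartley kernel, so that D has entries cas(2πkm/n) and D⁻¹ = D/n, and it assumes that the averaging step pairs component k of the DHT with component n − k. Equation 3.9 is not reproduced here, so that pairing is an assumption, as are the function names and the sample vector.

```python
import math


def cas(theta):
    """Hartley kernel: cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def hartley_matrix(n):
    """Symmetric matrix D with entries cas(2*pi*k*m/n)."""
    return [[cas(2 * math.pi * k * m / n) for m in range(n)]
            for k in range(n)]


def dht(x):
    """X = (1/n) D x, a single vector-matrix multiplication."""
    n = len(x)
    D = hartley_matrix(n)
    return [sum(D[k][m] * x[m] for m in range(n)) / n for k in range(n)]


def cosine_part(X):
    """Assumed averaging step: C_k = (X_k + X_{n-k}) / 2."""
    n = len(X)
    return [(X[k] + X[(n - k) % n]) / 2 for k in range(n)]


x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, 1.0]   # illustrative samples
X = dht(x)
C = cosine_part(X)
```

The averaging cancels the sine terms of the kernel, leaving the cosine sums. In hardware, the matrix product corresponds to the resistive array and the averaging to the summing amplifiers.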

6.4  Analog  VLSI  Circuit  Implementation  Of  The  Classifier  Network 

The network which we consider here for the final classification of the input signal is one
which conforms to the standard model for a pattern classifier feedforward neural network
(see [4]). Analog VLSI implementation of such layered, feedforward neural networks is
a topic of interest to a great many researchers in the field and a number of successful
implementations of varying complexity have been designed and fabricated (see e.g. [12],
[13], [14], and [15]).

Figure 18: Layout of resistive interconnection matrix chip.

Essential to the implementation of an adaptive feedforward neural network is the design
and fabrication of programmable synapses. We use the term programmable synapse
to describe a connection between two nodes in which the connection weight can be varied.
In terms of analog circuitry, a programmable synapse may be described as a transconductance
amplifier with variable gain. It is also essential that circuit complexity of a programmable
synapse be minimal. In [16] a hybrid digital-analog array of programmable
synapses is described which has been fabricated and is currently being tested at the
Naval Research Laboratory in Washington, D.C. As will be discussed later, the particular
digital-analog design in [16] is amenable to the overall network configuration which
we consider.

One major obstacle in VLSI implementation of a feedforward neural network is that
as the network grows in size (number of nodes and layers), full interlayer connectivity⁴
is intractable in terms of both allocation of silicon real estate and routing of the wires
required to implement connections. In this section we discuss a VLSI design strategy
and architecture which permits a great deal of flexibility with respect to the number
of layers in the network while maintaining full connectivity and programmability of the
connections. We call the architecture which we describe here STAACNNET, for STackable
Adaptive Analog Circuit Neural NETwork.

6.4.1  STAACNNET 

Among  the  more  appealing  attributes  of  neural  network  architectures  is  that  the  networks 
are composed of a large number of essentially identical elements. In the case of the classifier
network which we consider here, the network is composed of a number of identical
layers⁵. Each layer is composed of a number of identical nodes followed by a complete
set  of  connections  to  the  next  layer  of  nodes.  In  VLSI  implementation  of  multilayer, 
feedforward  neural  networks,  this  uniformity  may  be  exploited  to  allow  implementation 
of  large  networks  as  a  cascade  of  less  complex  elements. 

⁴ Full interlayer connectivity in this case refers to the fact that every node in a layer is connected to
every node in the layer preceding and the layer following it.

⁵ A layer with a smaller number of nodes may be implemented by simply setting appropriate connections
to zero.

Figure 19: Implementation architecture for classifier network

For implementation of the classifier network, we propose an architecture of the form
depicted in Figure 19. In this architecture, a single layer of nodes, together with a
complete set of connections to the next layer, is implemented on a single chip. Hence if
$K_N$ is the number of nodes in a layer, a single chip would incorporate $K_N$ nodes (with
inputs provided to each of these nodes), $K_N^2$ programmable connections, and $K_N$ outputs
to be connected to the inputs of the next layer. Each chip would be packaged individually
and fitted with a connector which connects to the pins on the package and also provides
a socket into which can be placed a second such chip. Since the connection weights are
programmable it is necessary to be able to select a particular connection whose weight is
to be modified. If $K_L$ is the maximum number of layers we are to implement in a single
network, we require $\log_2 K_L$ address lines to select a layer, and $2 \log_2(K_N)$ address lines
to select a connection within a selected layer. Hence the total number of pins required
on a package is⁶

$$ 2 K_N + \log_2(K_N^2) + \log_2(K_L) + 4. $$

For  example  if  we  would  like  to  use  20  nodes  in  a  layer  and  allow  up  to  four  layers  in 
the  network,  55  pins  are  required  per  chip.  Note  that  the  physical  stacking  of  the  chips 
is  just  a  convenient  feature  and  not  a  necessity  of  this  architecture,  since  an  equivalent 
configuration  may  be  obtained  by  simply  externally  connecting  the  appropriate  pins. 
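
The pin count in the example above can be checked with a short sketch. It assumes that each chip needs one pin per input and per output, that address-line counts are rounded up to whole lines, and that the connection address is rounded for the full set of connections within a layer rather than separately for rows and columns. This rounding choice is an assumption, adopted because it reproduces the 55 pins quoted. The function names are hypothetical.

```python
def address_bits(count):
    """Number of address lines needed to select one of `count` items."""
    return (count - 1).bit_length()   # equals ceil(log2(count)) for count >= 1


def pins_per_chip(nodes_per_layer, max_layers, extra_pins=4):
    """Pin estimate for one layer chip.

    Inputs and outputs contribute 2 * K_N pins, connection selection needs
    address lines for K_N**2 connections, layer selection needs address
    lines for K_L layers, and `extra_pins` covers supplies and programming.
    """
    io_pins = 2 * nodes_per_layer
    connection_lines = address_bits(nodes_per_layer ** 2)
    layer_lines = address_bits(max_layers)
    return io_pins + connection_lines + layer_lines + extra_pins


print(pins_per_chip(20, 4))   # 55, matching the example in the text
```

For 20 nodes per layer and up to four layers this gives 40 input and output pins, 9 connection address lines, 2 layer address lines and 4 additional pins.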

Training  algorithms  such  as  the  backpropagation  algorithm  would  be  implemented  in 
a  microprocessor  environment  from  which  individual  connection  weights  could  be  selected 
and  updated. 

The advantages of this structure are the following: (i) reduced density requirements
for individual chips, (ii) experimentation with number of layers and number of nodes in
different layers is facilitated, and (iii) full programmability of the connection patterns and
flexibility with learning rules since training algorithms are microprocessor controlled.


7  Discussion  and  Conclusions 


We  have  proposed  an  analog  neural  network  model  of  a  real-time  system  for  the 
decomposition  of  superposed  radar  returns  in  the  presence  of  noise.  This  system  consists 
of  three  distinctly  identifiable  processes  which  correspond  to  the  operations  of  noise 
reduction,  data  transformation,  and  classification  respectively.  Noise  reduction  via  a 
non-linear  neural  network  model  exhibits  highly  robust  performance  against  noise  level 
(signal  to  noise  ratio)  in  the  input.  It  has  further  been  shown  that  the  neural  network 
nonlinear  noise  filtering  is  far  superior  to  a  simple  low  pass  filter  implementation. 

Preliminary simulations favorably indicate the success of the proposed architecture.
Furthermore, the invocation of existing optimization based design tools promises to improve
the performance of such a system. Most applications of neural networks for classification
tasks have been previously restricted to binary decision regions (simple yes/no
answers to an arbitrary number of hypotheses). Here, we have demonstrated a neural
architecture capable of performing classification on a continuous decision space. Such
classification has been demonstrated to perform to a high degree of accuracy.

Some of the most significant benefits of the proposed system are a result of implementation
of the system as an analog VLSI circuit. Foremost among these benefits is
processing speed. A breadboard prototype of the MEDN [7] has demonstrated the network's
capability to converge to solutions in under 10 μsec. VLSI implementation of the
MEDN is underway at the Naval Research Lab in Washington D.C.

⁶ The additional 4 pins are required for power supply, ground, and programming voltage for the
connection weights.

APPENDIX A: Simulation Runs